The Journal of China Universities of Posts and Telecommunications (English Edition), 2014, Vol. 21, Issue (1): 79-85. doi: 10.1016/S1005-8885(14)60272-7

• Artificial Intelligence •

Data streams classification with ensemble model based on decision-feedback

Liu Jing (刘敬), Xu Guosheng (徐国胜), Zheng Shihui (郑世慧), Xiao Da (肖达), Gu Lize (谷利泽)

  1. Information Security Center, Beijing University of Posts and Telecommunications, Beijing 100876, China
  2. National Engineering Laboratory for Disaster Backup and Recovery, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2013-04-22 Revised: 2014-01-03 Online: 2014-02-28 Published: 2014-02-28
  • Corresponding author: Liu Jing, E-mail: liujing81@sohu.com
  • Supported by:

    This work was supported by the National Natural Science Foundation of China (61202082) and the Fundamental Research Funds for the Central Universities (BUPT2012RC0218, BUPT2012RC0219).

Abstract:

The main challenges in data stream classification are infinite length, concept drift, the arrival of novel classes, and the scarcity of labeled instances. Most existing techniques address only some of these challenges and ignore the rest. This paper presents an ensemble classification model based on decision-feedback (ECM-BDF) that addresses all of them. First, the data stream is divided into sequential chunks, and a classification model is trained on each labeled chunk. To handle infinite length and concept drift, a fixed number of such models form an ensemble model E, which is updated with subsequent labeled chunks. To handle novel classes and limited labeled instances, the model incorporates a novel class detection mechanism that detects the arrival of a novel class without training E on labeled instances of that class. Meanwhile, unsupervised models are trained on unlabeled instances to provide useful constraints for E. With these constraints as feedback information, an extended ensemble model Ex is obtained, and unlabeled instances are then classified more accurately by maximizing the consensus of Ex. Experimental results demonstrate that ECM-BDF outperforms traditional techniques in classifying data streams with limited labeled data.
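
The chunk-based ensemble described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the base learner, the replacement policy, the novel class test, or the form of the constraints, so everything below is an assumption chosen for brevity. The base model is a nearest-centroid classifier, the ensemble evicts the model with the highest error on the newest labeled chunk, an instance is flagged as novel when it lies farther than a hypothetical `novel_radius` from every known centroid, and `cluster_feedback` is a crude stand-in for the decision-feedback step that simply relabels each instance with the majority prediction of its cluster.

```python
import math
from collections import Counter

def centroids(chunk):
    """Train one base model: a per-class centroid map from a labeled chunk."""
    sums, counts = {}, Counter()
    for x, y in chunk:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def nearest(model, x):
    """Return (label, distance) of the class centroid closest to x."""
    return min(((y, math.dist(c, x)) for y, c in model.items()),
               key=lambda t: t[1])

def error(model, chunk):
    """Fraction of instances in a labeled chunk that the model misclassifies."""
    return sum(nearest(model, x)[0] != y for x, y in chunk) / len(chunk)

class ChunkEnsemble:
    def __init__(self, max_models=3, novel_radius=2.0):
        self.max_models = max_models      # fixed ensemble size (assumed value)
        self.novel_radius = novel_radius  # hypothetical outlier threshold
        self.models = []

    def update(self, labeled_chunk):
        """Add a model for the new chunk; evict the worst one if over capacity."""
        self.models.append(centroids(labeled_chunk))
        if len(self.models) > self.max_models:
            worst = max(self.models, key=lambda m: error(m, labeled_chunk))
            self.models.remove(worst)

    def predict(self, x):
        """Majority vote, or 'novel' if x is far from every known centroid."""
        hits = [nearest(m, x) for m in self.models]
        if all(d > self.novel_radius for _, d in hits):
            return "novel"
        return Counter(y for y, _ in hits).most_common(1)[0][0]

def cluster_feedback(predictions, cluster_ids):
    """Assumed stand-in for decision-feedback: relabel every instance with
    the majority ensemble prediction inside its (unsupervised) cluster."""
    votes = {}
    for p, c in zip(predictions, cluster_ids):
        votes.setdefault(c, Counter())[p] += 1
    return [votes[c].most_common(1)[0][0] for c in cluster_ids]
```

In use, `update` would be called once per labeled chunk and `predict` on each unlabeled instance; the cluster ids passed to `cluster_feedback` would come from any clustering of the unlabeled chunk. A full consensus-maximization step over Ex would be considerably more involved than this per-cluster vote.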

Key words:

ensemble classification, novel class, concept drifting, decision-feedback

CLC Number: